ROUGE 2.0: Updated and Improved Measures for Evaluation of Summarization Tasks

Author

  • Kavita Ganesan
Abstract

Evaluation of summarization tasks is crucial to determining the quality of machine-generated summaries. Over the last decade, ROUGE has become the standard automatic evaluation measure for summarization tasks. While ROUGE has been shown to be effective in capturing n-gram overlap between system and human-composed summaries, the existing ROUGE measures have several limitations in capturing synonymous concepts and coverage of topics. Thus, ROUGE scores often do not reflect the true quality of summaries and prevent multi-faceted evaluation of summaries (e.g., by topic, by overall content coverage, etc.). In this paper, we introduce ROUGE 2.0, which has several updated measures of ROUGE: ROUGE-N+Synonyms, ROUGE-Topic, ROUGE-Topic+Synonyms, ROUGE-TopicUniq and ROUGE-TopicUniq+Synonyms; all of which are improvements over the core ROUGE measures.
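The paper defines its own matching procedure; purely as an illustration of the idea behind ROUGE-N+Synonyms, here is a minimal ROUGE-N recall where tokens are folded through a hand-supplied synonym map before matching. The function names and the `synonyms` dictionary are assumptions for this sketch, not the paper's API:

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def normalize(token, synonyms):
    """Map a token to a canonical form so synonyms compare equal."""
    return synonyms.get(token, token)

def rouge_n_syn(system, reference, n, synonyms=None):
    """ROUGE-N recall: matched reference n-grams / total reference n-grams,
    with tokens normalized through an (assumed) synonym map first."""
    synonyms = synonyms or {}
    sys_tokens = [normalize(t, synonyms) for t in system.lower().split()]
    ref_tokens = [normalize(t, synonyms) for t in reference.lower().split()]
    sys_counts = Counter(ngrams(sys_tokens, n))
    ref_counts = Counter(ngrams(ref_tokens, n))
    overlap = sum(min(c, sys_counts[g]) for g, c in ref_counts.items())
    total = sum(ref_counts.values())
    return overlap / total if total else 0.0

# With the synonym map, all four reference unigrams match (score 1.0);
# plain ROUGE-1 on the same pair would score only 0.5.
score = rouge_n_syn("the film was excellent", "the movie was great", 1,
                    synonyms={"film": "movie", "excellent": "great"})
```

This illustrates why synonym folding raises recall when system and reference express the same concept with different words.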


Similar resources

ROUGE: A Package For Automatic Evaluation Of Summaries

ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation. It includes measures to automatically determine the quality of a summary by comparing it to other (ideal) summaries created by humans. The measures count the number of overlapping units such as n-gram, word sequences, and word pairs between the computer-generated summary to be evaluated and the ideal summaries created by humans...
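One of the overlap units mentioned above, word pairs, corresponds to skip-bigram matching (as in ROUGE-S). A simplified sketch of skip-bigram recall, not the package's actual implementation, might look like:

```python
from collections import Counter
from itertools import combinations

def skip_bigrams(tokens):
    """All ordered word pairs (skip-bigrams) in a token list."""
    return Counter(combinations(tokens, 2))

def rouge_s_recall(system, reference):
    """Fraction of the reference's skip-bigrams found in the system summary."""
    sys_sb = skip_bigrams(system.lower().split())
    ref_sb = skip_bigrams(reference.lower().split())
    overlap = sum(min(c, sys_sb[p]) for p, c in ref_sb.items())
    total = sum(ref_sb.values())
    return overlap / total if total else 0.0

score = rouge_s_recall("police killed the gunman",
                       "the gunman was killed by police")
```

Because skip-bigrams are ordered but need not be adjacent, the measure rewards preserved word order without requiring exact phrase matches.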


Summary Evaluation with and without References

We study a new content-based method for the evaluation of text summarization systems without human models which is used to produce system rankings. The research is carried out using a new content-based evaluation framework called FRESA to compute a variety of divergences among probability distributions. We apply our comparison framework to various well-established content-based evaluation measu...
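FRESA's actual divergence measures are defined in the paper; as an illustration of comparing a summary to its source without a human model, one such divergence, Jensen-Shannon over unigram distributions, can be sketched as follows (the add-one smoothing and all names here are assumptions of this sketch):

```python
import math
from collections import Counter

def word_dist(text, vocab):
    """Unigram probability distribution over a shared vocabulary,
    with add-one smoothing so every word has nonzero probability."""
    counts = Counter(text.lower().split())
    total = sum(counts.values()) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def js_divergence(p, q):
    """Jensen-Shannon divergence (base-2) between two distributions
    defined over the same vocabulary; bounded in [0, 1]."""
    m = {w: 0.5 * (p[w] + q[w]) for w in p}
    def kl(a, b):
        return sum(a[w] * math.log2(a[w] / b[w]) for w in a if a[w] > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

source = "the cat sat on the mat"
summary = "the cat sat"
vocab = set(source.lower().split()) | set(summary.lower().split())
div = js_divergence(word_dist(summary, vocab), word_dist(source, vocab))
```

A lower divergence indicates the summary's word distribution stays closer to the source's, which is the intuition behind reference-free content evaluation.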


Multi-Candidate Reduction for Flexible Single-Document Summarization

Sentence compression techniques based on linguistically-motivated syntactic rules have proved effective in single-document summarization tasks. The addition of topic terms yields state-of-the-art performance, according to previous evaluations. Since “trimming” rules must be applied successively, optimal rule ordering presents a challenge. This paper describes the Multi-Candidate Reduction (MCR)...


Multilingual Summarization Evaluation without Human Models

We study correlation of rankings of text summarization systems using evaluation methods with and without human models. We apply our comparison framework to various well-established content-based evaluation measures in text summarization such as coverage, Responsiveness, Pyramids and ROUGE studying their associations in various text summarization tasks including generic and focus-based multi-docu...


A Methodology for Extrinsic Evaluation of Text Summarization: Does ROUGE Correlate?

This paper demonstrates the usefulness of summaries in an extrinsic task of relevance judgment based on a new method for measuring agreement, Relevance-Prediction, which compares subjects’ judgments on summaries with their own judgments on full text documents. We demonstrate that this is a more reliable measure than previous measures based on gold standards. Because it is more reliable, we are ...




Publication date: 2018